Large language models (LLMs) have been shown to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built through a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
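Purpose: To design multi-disease classifiers for body CT scans for three different organ systems using labels automatically extracted from radiology text reports. Materials and Methods: This retrospective study included a total of 12,092 patients (mean age 57 ± 18; 6172 women) for model development and testing (from 2012 to 2017). Rule-based algorithms were used to extract 19,225 disease labels from 13,667 body CT scans from the 12,092 patients. Using a three-dimensional DenseVNet, three organ systems were segmented: lungs and pleura; liver and gallbladder; and kidneys and ureters. For each organ system, a three-dimensional convolutional neural network classified no apparent disease versus four common diseases, for a total of 15 different labels across all three models. Testing was performed on a subset of 2158 CT volumes from 2133 patients (mean age 58 ± 18; 1079 women) against 2875 manually derived reference labels. Performance was reported as the area under the receiver operating characteristic curve (AUC), with 95% confidence intervals computed by the DeLong method. Results: Manual validation of the extracted labels confirmed 91% to 99% accuracy across the 15 different labels. AUCs for the lungs and pleura labels were: atelectasis 0.77 (95% CI: 0.74, 0.81), nodule 0.65 (0.61, 0.69), emphysema 0.89 (0.86, 0.92), effusion 0.97 (0.96, 0.98), and no apparent disease 0.89 (0.87, 0.91). AUCs for the liver and gallbladder labels were: hepatobiliary calcification 0.62 (95% CI: 0.56, 0.67), lesion 0.73 (0.69, 0.77), dilation 0.87 (0.84, 0.90), fatty 0.89 (0.86, 0.92), and no apparent disease 0.82 (0.78, 0.85). AUCs for the kidneys and ureters labels were: stone 0.83 (95% CI: 0.79, 0.87), atrophy 0.92 (0.89, 0.94), lesion 0.68 (0.64, 0.72), cyst 0.70 (0.66, 0.73), and no apparent disease 0.79 (0.75, 0.83). Conclusion: Weakly supervised deep learning models were able to classify diverse diseases in multiple organ systems.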
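A sketch of a DeLong-style AUC confidence interval follows. It assumes one label's classifier scores, already split into positive and negative cases. The AUC is computed as the Mann-Whitney statistic, and the interval uses DeLong's variance estimate with a normal approximation clipped to [0, 1]. The function name `delong_auc_ci` is hypothetical, and this is not the study's evaluation code.

```python
import math
from typing import List, Tuple


def _psi(x: float, y: float) -> float:
    """Mann-Whitney kernel: 1 if the positive outscores the negative, 0.5 on a tie."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def delong_auc_ci(pos: List[float], neg: List[float],
                  z: float = 1.959964) -> Tuple[float, float, float]:
    """Return (AUC, lower, upper) from DeLong's variance and a normal approximation."""
    m, n = len(pos), len(neg)
    if m < 2 or n < 2:
        raise ValueError("need at least two positive and two negative cases")
    # Structural components: average kernel value per positive and per negative case.
    v10 = [sum(_psi(x, y) for y in neg) / n for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / m for y in neg]
    auc = sum(v10) / m
    s10 = sum((v - auc) ** 2 for v in v10) / (m - 1)
    s01 = sum((v - auc) ** 2 for v in v01) / (n - 1)
    se = math.sqrt(s10 / m + s01 / n)
    return auc, max(0.0, auc - z * se), min(1.0, auc + z * se)
```

For example, `delong_auc_ci([0.9, 0.8, 0.4], [0.3, 0.5, 0.1])` gives an AUC of 8/9, because 8 of the 9 positive/negative pairs are ranked correctly. With so few cases the upper bound is clipped to 1.0.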
As various city agencies and mobility operators navigate toward innovative mobility solutions, there is a need for strategic flexibility in well-timed investment decisions in the design and timing of mobility service regions, i.e., cast as "real options" (RO). This problem becomes increasingly challenging with multiple interacting ROs in such investments. We propose a scalable machine-learning-based RO framework for the multi-period sequential service region design and timing problem for mobility-on-demand (MoD) services, framed as a Markov decision process with non-stationary stochastic variables. A value function approximation policy from the literature uses multi-option least squares Monte Carlo simulation to obtain a policy value for a set of interdependent investment decisions as deferral options (CR policy). The goal is to determine the optimal selection and timing of a set of zones to include in a service region. However, prior work required explicit enumeration of all possible sequences of investments. To address the combinatorial complexity of such enumeration, we propose a new variant "deep" RO policy using an efficient recurrent neural network (RNN) based ML method (CR-RNN policy) to sample sequences. This removes the need for enumeration and makes the network design and timing policy tractable for large-scale implementation. Experiments on multiple service region scenarios in New York City (NYC) show that the proposed policy substantially reduces the overall computational cost (time reduction for RO evaluation of > 90% of total investment sequences is achieved), with zero to near-zero gap compared to the benchmark. A case study of sequential service region design for expansion of MoD services in Brooklyn, NYC, shows that using the CR-RNN policy to determine the optimal RO investment strategy yields similar performance (within 0.5% of the CR policy value) with significantly reduced computation time (about 5.4 times faster).
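The least squares Monte Carlo idea behind deferral-option valuation can be made concrete with a simplified single-option sketch. It assumes one hypothetical zone whose project value follows a risk-neutral geometric Brownian motion, with the option to pay a fixed investment cost at any of a few periods. It uses the classic Longstaff-Schwartz regression on a quadratic basis. It is not the multi-option CR policy or the CR-RNN policy, and the names `lsm_deferral_value`, `v0`, `cost`, `r`, and `sigma` are illustrative assumptions.

```python
import math
import random
from typing import List


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _fit_quadratic(xs: List[float], ys: List[float]) -> List[float]:
    """Least-squares coefficients for y ~ b0 + b1*x + b2*x**2 via the normal equations."""
    rows = [[1.0, x, x * x] for x in xs]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return _solve(ata, aty)


def lsm_deferral_value(v0: float, cost: float, r: float, sigma: float,
                       periods: int, n_paths: int, seed: int = 0) -> float:
    """Value of the option to invest `cost` in a project worth V_t, exercisable at
    t = 0, 1, ..., periods (unit time steps, risk-neutral GBM for V_t)."""
    rng = random.Random(seed)
    disc = math.exp(-r)  # one-period discount factor
    paths = []
    for _ in range(n_paths):
        v, path = v0, [v0]
        for _ in range(periods):
            v *= math.exp(r - 0.5 * sigma ** 2 + sigma * rng.gauss(0.0, 1.0))
            path.append(v)
        paths.append(path)
    # At the horizon, invest only if the project is worth more than its cost.
    cash = [max(p[periods] - cost, 0.0) for p in paths]
    for t in range(periods - 1, 0, -1):
        cash = [c * disc for c in cash]  # discount realized cash flows back to time t
        itm = [i for i, p in enumerate(paths) if p[t] > cost]
        if len(itm) < 3:
            continue  # too few in-the-money paths to regress on; keep deferring
        xs = [paths[i][t] / cost for i in itm]  # scaled state improves conditioning
        b0, b1, b2 = _fit_quadratic(xs, [cash[i] for i in itm])
        for i, x in zip(itm, xs):
            if paths[i][t] - cost > b0 + b1 * x + b2 * x * x:
                cash[i] = paths[i][t] - cost  # invest now; later cash flows are dropped
    defer_value = disc * sum(cash) / n_paths
    return max(v0 - cost, defer_value)  # invest at t = 0 or keep the deferral option
```

Calling `lsm_deferral_value(100.0, 100.0, 0.05, 0.3, 5, 2000, seed=1)` values an at-the-money deferral option. The multi-option setting in the text couples many such decisions across zones and investment sequences. That coupling is what makes enumeration expensive.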
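Emergency vehicles (EMVs) play a crucial role in responding to time-critical calls such as medical emergencies and fire outbreaks in urban areas. Existing EMV dispatch approaches typically optimize routes based on historical traffic-flow data and set traffic signals accordingly. However, we still lack a systematic approach to address the coupling between EMV routing and traffic signal control. In this paper, we propose EMVLight, a decentralized reinforcement learning (RL) framework for joint dynamic EMV routing and traffic signal pre-emption. We adopt the multi-agent advantage actor-critic method with policy sharing and a spatial discount factor. The framework addresses the coupling between EMV navigation and traffic signal control through an innovative design of multi-class RL agents and a novel pressure-based reward function. The proposed approach enables EMVLight to learn network-level cooperative traffic signal phasing strategies that reduce EMV travel time and also shorten the travel time of non-EMVs. Simulation-based experiments show that, compared with existing approaches, EMVLight reduces EMV travel time by 42.6% and yields a 23.5% shorter average travel time.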
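The two reward ingredients named above can be sketched in simplified form. Two assumptions are made here. Pressure uses the common max-pressure definition: incoming minus outgoing queue length, summed over an intersection's movements. The spatial discount weights another agent's reward by `alpha` raised to its hop distance. EMVLight's actual reward, including its EMV-specific agent classes, is not reproduced, and all function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple


def intersection_pressure(movements: List[Tuple[int, int]]) -> int:
    """Max-pressure style pressure: sum of (incoming queue - outgoing queue) per movement."""
    return sum(q_in - q_out for q_in, q_out in movements)


def hop_distances(adjacency: Dict[str, List[str]], source: str) -> Dict[str, int]:
    """Breadth-first search hop counts from one intersection to every reachable one."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adjacency[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def spatially_discounted_rewards(rewards: Dict[str, float],
                                 adjacency: Dict[str, List[str]],
                                 alpha: float) -> Dict[str, float]:
    """Each agent's learning signal sums reachable agents' rewards weighted by alpha ** hops."""
    out = {}
    for agent in rewards:
        dist = hop_distances(adjacency, agent)
        out[agent] = sum(alpha ** d * rewards[j] for j, d in dist.items())
    return out
```

Usage: with a reward of minus the pressure per intersection, for example `-intersection_pressure([(5, 2), (3, 3), (0, 1)])` for a pressure of 2, a three-intersection corridor A-B-C with rewards A = -2, B = -4, C = 0 and `alpha = 0.5` gives A a discounted reward of -4.0. Setting `alpha = 0` recovers fully independent agents.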
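We review the literature on models that try to explain human behavior in social interactions described by normal-form games with monetary payoffs. We start by covering social and moral preferences. We then focus on the growing body of research showing that people react to the language in which actions are described, especially when that language activates moral concerns. Finally, we argue that behavioral economics is in the midst of a paradigm shift towards language-based preferences, which will require the exploration of new models and experimental setups.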
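We provide a general theoretical framework for the brain and behavior that is both evolutionary and computational. The brain in our abstract model is a network of nodes and edges. Although it has some similarities with standard neural network models, there are, as we show, some significant differences. Both nodes and edges in our network have weights and activation levels. They act as probabilistic transducers that use a set of relatively simple rules to determine how activation levels and weights are affected by input, generate output, and affect each other. We show that these simple rules enable a learning process that allows the network to represent increasingly complex knowledge. At the same time, the network acts as a computing device that facilitates planning, decision-making, and the execution of behavior. By specifying the innate (genetic) components of the network, we show how evolution could endow the network with initial adaptive rules and goals that are then enriched through learning. We demonstrate how the developing structure of the network (which determines what the brain can do and how well) is critically affected by the co-evolved coordination between the mechanisms affecting the distribution of data input and those determining the learning parameters (used in the programs run by nodes and edges). Finally, we consider how the model accounts for various findings in the field of learning. We also consider how it addresses some challenging problems of mind and behavior, such as those related to setting goals and self-control, and how it can help in understanding some cognitive disorders.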
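Generalized structural equation models (GSEMs) [Peters and Halpern 2021] are, as the name suggests, a generalization of structural equation models (SEMs). Among other things, they can deal with infinitely many variables, which is critical for capturing dynamical systems. We provide a sound and complete axiomatization of GSEMs that extends the sound and complete axiomatization provided by Halpern [2000] for SEMs. Considering GSEMs helps clarify which properties Halpern's axioms capture.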
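Structural equation models (SEMs) are likely the most commonly used framework for modeling causality. However, as we show, naively extending this framework to infinitely many variables, which is necessary, for example, to model dynamical systems, runs into several problems. We introduce GSEMs (generalized SEMs), a flexible generalization of SEMs that directly specifies the results of interventions. In GSEMs, (1) systems of differential equations can be represented in a natural and intuitive way, (2) certain natural situations that cannot be represented by SEMs can be represented easily, and (3) the definition of actual causality in SEMs carries over essentially unchanged.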
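The ordinary SEM semantics of an intervention, which GSEMs generalize, can be sketched briefly. Under do(X = x), the equation for X is replaced by the constant x, and the remaining equations are solved in order. The sketch assumes a finite, recursive (acyclic) model with a fixed context. The `outcomes` wrapper only mimics the idea of mapping an intervention directly to its result. It is not the GSEM formalism, which also covers cases such as infinitely many variables that this code cannot express. All names are hypothetical.

```python
from typing import Callable, Dict, List, Mapping

Equation = Callable[[Mapping[str, int]], int]


def solve_recursive_sem(equations: Dict[str, Equation], order: List[str],
                        intervention: Mapping[str, int]) -> Dict[str, int]:
    """Evaluate an acyclic SEM in topological order under the intervention do(X = x).

    Intervened variables ignore their structural equation and take the forced value;
    every other variable is computed from the values already assigned to its parents.
    """
    values: Dict[str, int] = {}
    for var in order:
        values[var] = intervention[var] if var in intervention else equations[var](values)
    return values


def outcomes(equations: Dict[str, Equation], order: List[str],
             intervention: Mapping[str, int]) -> List[Dict[str, int]]:
    """Map an intervention directly to its outcomes (exactly one for a recursive SEM)."""
    return [solve_recursive_sem(equations, order, intervention)]


# Disjunctive forest-fire example with a fixed context: lightning (L) and a dropped
# match (MD) both occur, and a forest fire (FF) starts if either one does.
FOREST_FIRE: Dict[str, Equation] = {
    "L": lambda v: 1,
    "MD": lambda v: 1,
    "FF": lambda v: max(v["L"], v["MD"]),
}
ORDER = ["L", "MD", "FF"]
```

Usage: `outcomes(FOREST_FIRE, ORDER, {"L": 0})` still yields a fire, because the match alone suffices. Intervening with `{"L": 0, "MD": 0}` yields no fire.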
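The Kepler and TESS missions have produced more than 100,000 potential transit signals that must be processed in order to create a catalog of planet candidates. Over the past few years, there has been growing interest in using machine learning to analyze these data in search of new exoplanets. Unlike existing machine learning works, ExoMiner, the deep learning classifier proposed in this work, mimics how domain experts examine diagnostic tests to vet a transit signal. ExoMiner is a highly accurate, explainable, and robust classifier that 1) allows us to validate 301 new exoplanets from the MAST Kepler Archive and 2) is general enough to be applied across missions such as the ongoing TESS mission. We perform an extensive experimental study to verify that ExoMiner is more reliable and accurate than existing transit signal classifiers in terms of different classification and ranking metrics. For example, at a fixed precision of 99%, ExoMiner retrieves 93.6% of all exoplanets in the test set (i.e., recall = 0.936), while this rate is 76.3% for the best existing classifier. Furthermore, the modular design of ExoMiner favors its explainability. We introduce a simple explainability framework that provides experts with feedback on why ExoMiner classifies a transit signal into a specific class label (e.g., planet candidate or not planet candidate).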
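A "recall at a fixed precision" figure like the one quoted above can be computed by sweeping a decision threshold over classifier scores. The metric keeps the best recall among thresholds whose precision meets the target. This is a minimal sketch with a hypothetical function name, not ExoMiner's evaluation code. Examples with tied scores are admitted together, since a single threshold cannot separate them.

```python
from typing import Sequence


def recall_at_precision(scores: Sequence[float], labels: Sequence[int],
                        min_precision: float) -> float:
    """Highest recall over score thresholds whose precision is at least min_precision."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Admit every example tied at this score together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        if tp / (tp + fp) >= min_precision:
            best = max(best, tp / total_pos)
    return best
```

Usage: for scores `[0.95, 0.9, 0.8, 0.7, 0.6]` with labels `[1, 1, 0, 1, 0]`, requiring 99% precision stops the sweep before the first false positive, giving recall 2/3. Relaxing the target to 75% admits all three positives.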
In this paper, we propose a novel technique, INVALIDATOR, to automatically assess the correctness of patches generated by automated program repair (APR) via semantic and syntactic reasoning. INVALIDATOR reasons about program semantics via program invariants. It also captures program syntax via language semantics learned from a large code corpus using a pre-trained language model. Given a buggy program and the developer-patched program, INVALIDATOR infers likely invariants on both programs. Then, INVALIDATOR determines that an APR-generated patch overfits if it (1) violates correct specifications or (2) maintains the erroneous behaviors of the original buggy program. If our approach fails to determine from invariants whether a patch overfits, INVALIDATOR uses a model trained on labeled patches to assess patch correctness based on program syntax. The benefits of INVALIDATOR are threefold. First, INVALIDATOR is able to leverage both semantic and syntactic reasoning to enhance its discriminative capability. Second, INVALIDATOR does not require new test cases to be generated. Instead, it relies only on the current test suite and uses invariant inference to generalize the behaviors of a program. Third, INVALIDATOR is fully automated. We conducted our experiments on a dataset of 885 patches generated on real-world programs in Defects4J. Experimental results show that INVALIDATOR correctly classified 79% of overfitting patches, detecting 23% more overfitting patches than the best baseline. INVALIDATOR also substantially outperforms the best baselines by 14% and 19% in terms of Accuracy and F-Measure, respectively.
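The invariant-based decision rule described above can be sketched under simplifying assumptions. Invariants come from a small hypothetical pool of predicates over observed program states, and "inference" keeps whichever predicates never fail, in the style of Daikon. The correct specification is taken to be the invariants of the developer-patched program. Erroneous behavior is taken to be the invariants that hold only on the buggy program. `syntactic_model` is a placeholder for the trained syntax-based classifier. This is an illustration, not INVALIDATOR's implementation.

```python
from typing import Callable, Dict, List, Optional, Set

State = Dict[str, int]
Invariant = Callable[[State], bool]

# Hypothetical pool of candidate invariants over an input x and a return value ret.
POOL: Dict[str, Invariant] = {
    "ret >= 0": lambda s: s["ret"] >= 0,
    "ret == abs(x)": lambda s: s["ret"] == abs(s["x"]),
    "ret == x": lambda s: s["ret"] == s["x"],
}


def infer(pool: Dict[str, Invariant], states: List[State]) -> Set[str]:
    """Keep every candidate invariant that holds on all observed states."""
    return {name for name, inv in pool.items() if all(inv(s) for s in states)}


def overfits_by_invariants(pool: Dict[str, Invariant], buggy_states: List[State],
                           fixed_states: List[State],
                           patch_states: List[State]) -> Optional[bool]:
    """True if the patch is judged overfitting; None if the invariants cannot decide."""
    correct_spec = infer(pool, fixed_states)                   # developer-fix behavior
    error_behavior = infer(pool, buggy_states) - correct_spec  # buggy-only behavior
    patch_invs = infer(pool, patch_states)
    if correct_spec - patch_invs:      # (1) some correct invariant no longer holds
        return True
    if error_behavior & patch_invs:    # (2) the patch keeps a buggy-only invariant
        return True
    return None


def classify_patch(pool: Dict[str, Invariant], buggy_states: List[State],
                   fixed_states: List[State], patch_states: List[State],
                   syntactic_model: Callable[[], bool]) -> bool:
    """Invariant check first; fall back to the placeholder syntax-based model."""
    verdict = overfits_by_invariants(pool, buggy_states, fixed_states, patch_states)
    return verdict if verdict is not None else syntactic_model()


# States observed by running a test suite on an absolute-value function.
BUGGY = [{"x": 2, "ret": 2}, {"x": -3, "ret": -3}]  # buggy version returns x
FIXED = [{"x": 2, "ret": 2}, {"x": -3, "ret": 3}]   # developer fix returns abs(x)
```

Usage: a patch returning `max(x, 0)` breaks the inferred `ret == abs(x)` invariant and is flagged as overfitting. A patch whose observed states match the developer fix is left undecided by the invariants and handed to the syntactic model.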